Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make murmurhash3_x64_128 compatible with existing cuco data structures #501

Merged
merged 6 commits into from
Jun 19, 2024

Conversation

srinivasyadav18
Copy link
Contributor

@srinivasyadav18 srinivasyadav18 commented Jun 5, 2024

Make murmurhash3_x64_128 compatible with existing cuco data structures

  • Moves sanitize_hash function from cuco::detail:: namespace to probing_scheme_base class as protected memeber.
  • Modifies the santize_hash function to handle cuda::std::array<uint64_t, 2>, which is returned from murmurhash3_x64_128 hash function.
  • Adds new hash_test.cu to test static_map API with all hash functions.

Copy link

copy-pr-bot bot commented Jun 5, 2024

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@srinivasyadav18 srinivasyadav18 force-pushed the murmur128_compatibility branch from 6af4877 to 2001837 Compare June 6, 2024 21:45
Copy link
Member

@PointKernel PointKernel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor cleanups.

Based on the offline discussions, we are not happy with the fact that sanitize_hash has to be invoked twice in CG-based probing. @sleeepyjack Any idea how to improve the situation?

include/cuco/detail/probing_scheme_base.cuh Outdated Show resolved Hide resolved
include/cuco/detail/probing_scheme_base.cuh Outdated Show resolved Hide resolved
include/cuco/detail/probing_scheme_impl.inl Outdated Show resolved Hide resolved
include/cuco/detail/utils.cuh Show resolved Hide resolved
@PointKernel
Copy link
Member

/ok to test

@srinivasyadav18 srinivasyadav18 force-pushed the murmur128_compatibility branch from 469e39b to 2001837 Compare June 6, 2024 22:57
@sleeepyjack
Copy link
Collaborator

sleeepyjack commented Jun 8, 2024

I have some high-level questions/suggestions regarding this PR:

  • Why do we need to move the sanitize_hash function to the probing_scheme_base?
  • In my opinion, there is no real world use in having a size type larger than sizeof(size_t). So why not simply truncate the hash value to that max bit width and then call sanitize_hash on it?
  • The sanitize_hash function still has some unresolved problems (see draft PR Fix hash to size conversion #362) which I haven't been able to fix yet. Any chance we can also get this resolved in this PR? Should be a small but fun challenge @srinivasyadav18 ;)
  • nit: GH regularly gets confused and loses the discussion history when rebasing+force-pushing to the PR branch. We're fine with just adding or merging on top of the current HEAD...we will squash-merge the branch anyway so the intermediate commits will not be visible in the default branch.

@PointKernel
Copy link
Member

PointKernel commented Jun 8, 2024

Why do we need to move the sanitize_hash function to the probing_scheme_base?

I suggested to do so since probing is the only place using sanitize_hash.

So why not simply truncate the hash value to that max bit width and then call sanitize_hash on it?

It works but technically we are not using the hash value in a proper way, is it fair to say so?

@sleeepyjack
Copy link
Collaborator

sleeepyjack commented Jun 8, 2024

I suggested to do so since probing is the only place using sanitize_hash.

And I agree that it makes sense to put utils with the class where they have one-time use. However, since it's a template function the syntax doesn't get much cleaner doing so: probing_scheme_base::template sanitize_hash<SizeType>(...) vs. cuco::detail::sanitize_hash<SizeType>(...).

It works but technically we are not using the hash value in a proper way, is it fair to say so?

static_cast<size_t>(some_uint128) will also just truncate the upper bits so I'd say it's valid. We may want to add a static_assert to our extent type to limit the bit width to 64.

include/cuco/detail/utils.cuh Outdated Show resolved Hide resolved
template <typename SizeType, typename HashType>
__host__ __device__ constexpr SizeType sanitize_hash(HashType hash, std::uint32_t cg_rank) noexcept
{
return sanitize_hash<SizeType>(sanitize_hash<SizeType>(hash) + cg_rank);
Copy link
Collaborator

@sleeepyjack sleeepyjack Jun 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There exists a scenario where this approach fails and it's the reason CI in my initial draft PR failed.
Consider the following example:

  • SizeType is int32_t aka a signed type
  • The value of sanitize_hash<SizeType>(hash) is very close to numeric_limits<SizeType>::max()

In this scenario, if we compute sanitize_hash<SizeType>(hash) + cg_rank there's chance the result oxceeds numeric_limits<SizeType>::max() which would result in a signed integer overflow which is undefined behavior under the C++ abstract machine. Thus the compiler is free to produce any garbage code around this call.

To solve this we need check if sanitize_hash<SizeType>(hash) > (numeric_limits<SizeType>::max() - group.size()) (be careful with > and >=, I'm infamous for my off-by-one errors) and then compute the wrapped-around value manually in case this expression evaluates to true

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the detailed explanation!
I think 3ae1afe covers this case now.

Copy link
Member

@PointKernel PointKernel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some styling nits otheriwse good to go

include/cuco/detail/utils.cuh Outdated Show resolved Hide resolved
include/cuco/detail/probing_scheme_base.cuh Outdated Show resolved Hide resolved
@PointKernel
Copy link
Member

/ok to test

@PointKernel PointKernel requested a review from sleeepyjack June 18, 2024 17:42
Copy link
Member

@PointKernel PointKernel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work as always! Thanks!

return detail::probing_iterator<Extent>{
cuco::detail::sanitize_hash<size_type>(hash_(probe_key) + g.thread_rank()) % upper_bound,
cuco::detail::sanitize_hash<cg_type, size_type>(g, hash_(probe_key)) % upper_bound,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would move the size_type to the front of the tparam list so you don't have to specify the cg_type as it can be inferred from g.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's about the intention of the API and the syntax consistency.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, the g parameter should be the first one for consistency reasons but we can still use a different ordering for the tparam list, i.e., the one that lets us make use of automatic type inference. The only tparam that cannot be inferred is the result size type so specifying the CG type is redundant.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's an internal API so I don't want to bikeshed too much. I'm okay with merging it as is.

Copy link
Collaborator

@sleeepyjack sleeepyjack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀 🚀 🚀

@sleeepyjack
Copy link
Collaborator

/ok to test

@sleeepyjack sleeepyjack merged commit c63ac89 into NVIDIA:dev Jun 19, 2024
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants